AITopics | training direction

Collaborating Authors

training direction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies

Luo, Xiaoliang, Xu, Xinyi, Ramscar, Michael, Love, Bradley C.

arXiv.org Artificial IntelligenceMay-14-2025

Can autoregressive large language models (LLMs) learn consistent probability distributions when trained on sequences in different token orders? We prove formally that for any well-defined probability distribution, sequence perplexity is invariant under any factorization, including forward, backward, or arbitrary permutations. This result establishes a rigorous theoretical foundation for studying how LLMs learn from data and defines principled protocols for empirical evaluation. Applying these protocols, we show that prior studies examining ordering effects suffer from critical methodological flaws. We retrain GPT-2 models across forward, backward, and arbitrary permuted orders on scientific text. We find systematic deviations from theoretical invariance across all orderings with arbitrary permutations strongly deviating from both forward and backward models, which largely (but not completely) agreed with one another. Deviations were traceable to differences in self-attention, reflecting positional and locality biases in processing. Our theoretical and empirical results provide novel avenues for understanding positional biases in LLMs and suggest methods for detecting when LLMs' probability distributions are inconsistent and therefore untrustworthy.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.08739

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Modify Training Directions in Function Space to Reduce Generalization Error

Yu, Yi, Lu, Wenlian, Chen, Boyu

arXiv.org Artificial IntelligenceJul-25-2023

We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix. We firstly present analytical expression for the function learned by this modified natural gradient under the assumptions of Gaussian distribution and infinite width limit. Thus, we explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory. By decomposing of the total generalization error attributed to different eigenspace of the kernel in function space, we propose a criterion for balancing the errors stemming from training set and the distribution discrepancy between the training set and the true data. Through this approach, we establish that modifying the training direction of the neural network in function space leads to a reduction in the total generalization error. Furthermore, We demonstrate that this theoretical framework is capable to explain many existing results of generalization enhancing methods. These theoretical results are also illustrated by numerical examples on synthetic data.

artificial intelligence, generalization error, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.1329

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

5 algorithms to train a neural network

#artificialintelligenceApr-1-2018, 21:09:54 GMT

The procedure used to carry out the learning process in a neural network is called the training algorithm. There are many different training algorithms, with different characteristics and performance. The learning problem in neural networks is formulated in terms of the minimization of a loss function, f. This function is in general, composed of an error and a regularization terms. The error term evaluates how a neural network fits the data set. On the other hand, the regularization term is used to prevent overfitting, by controlling the effective complexity of the neural network.

artificial intelligence, machine learning, neural network, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

5 algorithms to train a neural network

#artificialintelligenceOct-22-2016, 07:51:24 GMT

The procedure used to carry out the learning process in a neural network is called the training algorithm. There are many different training algorithms, whith different characteristics and performance. The learning problem in neural networks is formulated in terms of the minimization of a loss function, f. This function is in general, composed of an error and a regularization terms. The error term evaluates how a neural network fits the data set. On the other hand, the regularization term is used to prevent overfitting, by controlling the effective complexity of the neural network.

artificial intelligence, machine learning, neural network, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Mechanisms of Generalization in Perceptual Learning

Liu, Zili, Weinshall, Daphna

Neural Information Processing SystemsDec-31-1999

The learning of many visual perceptual tasks has been shown to be specific to practiced stimuli, while new stimuli require re-Iearning from scratch. Here we demonstrate generalization using a novel paradigm in motion discrimination where learning has been previously shown to be specific. We trained subjects to discriminate the directions of moving dots, and verified the previous results that learning does not transfer from the trained direction to a new one. However, by tracking the subjects' performance across time in the new direction, we found that their rate of learning doubled. Therefore, learning generalized in a task previously considered too difficult for generalization.

discrimination task, sensitivity, training direction, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback

Mechanisms of Generalization in Perceptual Learning

Liu, Zili, Weinshall, Daphna

Neural Information Processing SystemsDec-31-1999

discrimination task, sensitivity, training direction, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback

Mechanisms of Generalization in Perceptual Learning

Liu, Zili, Weinshall, Daphna

Neural Information Processing SystemsDec-31-1999

Zili Lin Rutgers University, Newark DaphnaWeinshall Hebrew University, Israel Abstract The learning of many visual perceptual tasks has been shown to be specific to practiced stimuli, while new stimuli require re-Iearning from scratch. Here we demonstrate generalization using a novel paradigm in motion discrimination where learning has been previously shownto be specific. We trained subjects to discriminate the directions of moving dots, and verified the previous results that learning does not transfer from the trained direction to a new one. However, by tracking the subjects' performance across time in the new direction, we found that their rate of learning doubled. Therefore, learning generalized in a task previously considered too difficult for generalization.

artificial intelligence, machine learning, sensitivity, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.24)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback